
Fix 32-bit decoding with large dictionary #3472

Merged
merged 1 commit into facebook:dev on Feb 2, 2023

Conversation

@terrelln commented Feb 2, 2023

The 32-bit decoder could corrupt the regenerated data by using regular offset mode when there were actually long offsets. This is because we were only considering the window size in the calculation, not the dictionary size. So a large dictionary could allow longer offsets.
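For context, a rough sketch of the decision this paragraph describes, assuming the 32-bit bit-reader can only guarantee STREAM_ACCUMULATOR_MIN_32 (25) bits per reload; the function name below is hypothetical and simplified, not the actual zstd identifier:

```c
#include <stddef.h>

/* Bits the bit-reader can guarantee per reload on 32-bit builds (zstd's
 * STREAM_ACCUMULATOR_MIN_32). An offset code needing more additional bits
 * than this cannot be read in a single pass and requires the dedicated
 * long-offsets decoding path. */
#define STREAM_ACCUMULATOR_MIN_32 25

/* Buggy gate (sketch): only the window size is consulted. With a large
 * dictionary, the referenceable history can be larger than the window size,
 * so offsets can exceed this bound even when the window alone would not
 * require the long-offsets decoder. */
static int use_long_offsets_old(size_t window_size)
{
    return window_size > ((size_t)1 << STREAM_ACCUMULATOR_MIN_32);
}
```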

Fix this in two ways:

  1. Instead of looking at the window size, look at the total referenceable bytes in the history buffer, and use that in the comparison instead of the window size. Additionally, we were comparing against the wrong value; it was too low. Fix that by computing exactly the maximum offset for regular sequence decoding.
  2. If long offsets are still possible after (1), check the offset code decoding table: if the table's maximum number of additional bits is no more than STREAM_ACCUMULATOR_MIN, then we can't have long offsets.

Together, these checks ensure we use the long-offsets decoder only when we are very likely to actually have long offsets.
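A minimal sketch of that two-step gating, assuming `total_history` is the number of bytes the block can reference (previously decoded output plus dictionary content) and `of_max_extra_bits` is the largest "number of additional bits" entry in the block's offset-code decoding table; the names and the simplified regular-offset bound are illustrative, not the real zstd code:

```c
#include <stddef.h>

#define STREAM_ACCUMULATOR_MIN_32 25  /* guaranteed bits per reload on 32-bit */

typedef enum { REGULAR_OFFSETS, LONG_OFFSETS } offset_mode;

static offset_mode choose_offset_decoder(size_t total_history,
                                         unsigned of_max_extra_bits)
{
    /* Step 1: if nothing referenceable lies beyond the largest offset the
     * regular decoder can handle, long offsets are impossible. The bound is
     * simplified here; the real fix computes the exact maximum offset for
     * regular sequence decoding. */
    size_t const max_regular_offset = (size_t)1 << STREAM_ACCUMULATOR_MIN_32;
    if (total_history <= max_regular_offset)
        return REGULAR_OFFSETS;

    /* Step 2: even with a large history, the block's offset decoding table
     * may never emit a code needing more than STREAM_ACCUMULATOR_MIN_32
     * additional bits, in which case long offsets still cannot occur. */
    if (of_max_extra_bits <= STREAM_ACCUMULATOR_MIN_32)
        return REGULAR_OFFSETS;

    return LONG_OFFSETS;
}
```

The question only arises for 32-bit decoding, where the accumulator guarantee is 25 bits; on 64-bit builds the accumulator is wide enough that every legal offset code can be read in a single pass.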

Note that this bug only affects decoding: the original compressed data, if re-read with a patched decoder, will correctly regenerate the original data. The one exception is that the encoder previously had the same issue.

This fixes both of the open OSS-Fuzz issues.

Credit to OSS-Fuzz

@terrelln terrelln force-pushed the 2023-02-01-fix-32-bit-decoding branch 3 times, most recently from 4e7a209 to cc3e3ac on February 2, 2023 01:13
@Cyan4973 Cyan4973 merged commit c22c995 into facebook:dev Feb 2, 2023
Cyan4973 commented Feb 2, 2023

arf, I just meant to approve the PR, but got it merged instead...
Anyway, I believe it's good to go.

terrelln commented Feb 2, 2023

arf, I just meant to approve the PR, but got it merged instead...

No worries! It's a good thing, because that means today's fuzzer builds picked up the fix.
